Decision-tree-based triphones are robust and practical for Mandarin speech recognition
Abstract
In large-vocabulary, speaker-independent speech recognition systems, modeling vocabulary words with subword units is mandatory. This paper compares triphone units with biphone and context-independent phonetic units for Mandarin speech recognition. To handle triphones that are unseen in the training data, decision-tree-based clustering is applied to the triphone units. This method achieves high recognition performance with limited training data and also reduces model training time. The robustness and effectiveness of cross-word, tree-based triphone units are demonstrated on a speaker-independent continuous Mandarin speech recognition task. Tying the states of the triphone models reduces training computation time by a factor of about 2.3, and syllable recognition accuracy increases by 28.7% compared with monophone units and by 13.5% compared with biphone units.
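A minimal Python sketch of decision-tree state clustering may help show how tying lets a recognizer cover unseen triphones. The abstract does not give the paper's exact procedure, so this follows the common HTK-style recipe as an assumption: the states at one position of all triphones sharing a center phone are pooled, each summarized by an occupancy count and a diagonal Gaussian; a node is split greedily by whichever left- or right-context phonetic question most increases the single-Gaussian log-likelihood, and splitting stops when the gain or the child occupancy falls below a threshold. The `StateStats` layout, the two questions, the thresholds `min_gain` and `min_occ`, and the toy statistics are all hypothetical. Because every question can be answered for any context, an unseen triphone is simply routed down the tree to an existing tied state.

```python
import math
from dataclasses import dataclass


@dataclass
class StateStats:
    """Hypothetical per-state statistics for one triphone l-c+r (HTK-style label)."""
    left: str
    center: str
    right: str
    occ: float          # state occupancy (sum of frame posteriors)
    mean: list[float]   # diagonal Gaussian mean
    var: list[float]    # diagonal Gaussian variance

    def label(self) -> str:
        return f"{self.left}-{self.center}+{self.right}"


def pooled_loglik(states):
    """Log-likelihood of the pooled data under one shared diagonal Gaussian."""
    n = sum(s.occ for s in states)
    total = 0.0
    for d in range(len(states[0].mean)):
        m = sum(s.occ * s.mean[d] for s in states) / n
        ex2 = sum(s.occ * (s.var[d] + s.mean[d] ** 2) for s in states) / n
        v = max(ex2 - m * m, 1e-6)  # variance floor guards against log(0)
        total += math.log(2 * math.pi * v) + 1.0
    return -0.5 * n * total


def split(states, question):
    """Partition states by a (side, phone_set) context question."""
    side, phones = question
    yes = [s for s in states if getattr(s, side) in phones]
    no = [s for s in states if getattr(s, side) not in phones]
    return yes, no


def build_tree(states, questions, min_gain=1.0, min_occ=5.0):
    """Greedy top-down clustering; each leaf becomes one tied state."""
    base = pooled_loglik(states)
    best = None
    for q in questions:
        yes, no = split(states, q)
        if not yes or not no:
            continue  # question does not separate this node
        if min(sum(s.occ for s in yes), sum(s.occ for s in no)) < min_occ:
            continue  # a child would have too little data to train
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if best is None or gain > best[0]:
            best = (gain, q, yes, no)
    if best is None or best[0] < min_gain:
        return {"leaf": sorted(s.label() for s in states)}
    _, q, yes, no = best
    return {"question": q,
            "yes": build_tree(yes, questions, min_gain, min_occ),
            "no": build_tree(no, questions, min_gain, min_occ)}


def find_leaf(tree, left, right):
    """Route any context, seen or unseen, to a tied state (leaf)."""
    while "question" in tree:
        side, phones = tree["question"]
        value = left if side == "left" else right
        tree = tree["yes"] if value in phones else tree["no"]
    return tree["leaf"]


# Toy data for center phone "a", one state position; all numbers are made up.
states = [
    StateStats("m", "a", "t", 10, [5.0], [1.0]),
    StateStats("n", "a", "k", 10, [5.2], [1.0]),
    StateStats("s", "a", "t", 10, [0.0], [1.0]),
    StateStats("sh", "a", "k", 10, [0.3], [1.0]),
]
questions = [("left", {"m", "n", "ng"}), ("right", {"t", "d"})]
tree = build_tree(states, questions)
print(find_leaf(tree, "ng", "d"))  # unseen ng-a+d -> ['m-a+t', 'n-a+k']
```

Here the nasal-left-context question separates the two groups of means, so it wins the root split; the right-context question yields almost no gain and is rejected. Practical systems typically also merge leaves whose combination costs little likelihood and use much larger question sets; both are omitted for brevity.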
Similar resources
Predicting unseen triphones with senones
In large-vocabulary speech recognition, the decoder often encounters triphones that are not covered in the training data. These unseen triphones are usually represented by corresponding diphones or context-independent monophones. We propose to use decision-tree based senones to generate needed senonic baseforms for unseen triphones. A decision tree is built for each individual Markov state of e...
Phonetic Question Generation Using Misrecognition
Most automatic speech recognition systems are currently based on tied-state triphones. These tied states are usually determined by a decision tree. Decision trees can automatically cluster triphone states into many classes according to data available allowing each class to be trained efficiently. In order to achieve higher accuracy, this clustering is constrained by manually generated phonetic ...
Predicting Unseen Triphones with Senones (IEEE Transactions on Speech and Audio Processing)
In large-vocabulary speech recognition, we often encounter triphones that are not covered in the training data. These unseen triphones are usually backed off to their corresponding diphones or context-independent phones, which contain less context yet have plenty of training examples. In this paper, we propose to use decision-tree-based senones to generate needed senonic baseforms for these uns...
Cluster adaptive training with factorized decision trees for speech recognition
Cluster adaptive training (CAT) is a popular approach to train multiple-cluster HMMs for fast speaker adaptation in speech recognition. Traditionally, a cluster-independent decision tree is shared among all clusters, which could limit the modelling power of multiple-cluster HMMs. In this paper, each cluster is allowed to have its own decision tree. The intersections between the triphones subset...
Triphone tying techniques combining a-priori rules and data driven methods
Tying of Hidden Markov Model states is an important issue for the use of triphones as modeling units in automatic speech recognition systems. This paper studies the application of a priori rules for tying in combination with data-driven methods. The baseline method features a combination of a priori rules that reduce the theoretical number of units by an order of magnitude and a simple back-off ...
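Several of the abstracts above note that unseen triphones are usually backed off to diphones or context-independent phones, which carry less context but have plenty of training examples. A minimal sketch of that baseline, for contrast with tree-based tying, assuming HTK-style `l-c+r` labels and a hypothetical set `trained` of unit names that have models; the order in which the two biphones are tried is also an assumption.

```python
def back_off(triphone: str, trained: set[str]) -> str:
    """Return the most specific trained unit for a label of the form 'l-c+r'."""
    left, rest = triphone.split("-", 1)
    center, right = rest.split("+", 1)
    # Try full context first, then each biphone (assumed order), then the monophone.
    for unit in (triphone, f"{left}-{center}", f"{center}+{right}", center):
        if unit in trained:
            return unit
    raise KeyError(f"no model for phone {center!r}")


# Hypothetical inventory of trained units.
trained = {"a", "m-a", "n-a+t"}
print(back_off("s-a+k", trained))  # no triphone or biphone model, so falls back to 'a'
```

Each fallback step trades context for training data, whereas a decision tree keeps whatever context its questions can still resolve for the unseen triphone.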